1. Introduction


In [1], Valiant introduced a rich framework for the analysis of algorithms that learn to approximate
sets from randomly chosen elements within and without the sets. This framework and its extensions have
been analyzed by a number of authors, [2, 3, 4, 5] amongst others. In this paper, we present a new
framework concerning algorithms that learn to solve problems approximately, from sample instances. Early steps in
this direction were taken in [4]. In a sense, this can be viewed as learning to improve computational
efficiency as opposed to concept learning in the sense of Valiant. We believe that this is an important
new direction in the formal theory of learning.

Consider  the  problem  of  symbolic  integration.  Given  the  definition  of  the  problem  and  a  standard 
table  of  integrals,  we  have  complete  information  on  how  to  solve  the  problem.  Yet,  although  we  are 
capable  of  solving  instances  of  symbolic  integration  immediately,  we  are  by  no  means  efficient  in  our 
methods.  It  appears  that  we  need  to  examine  sample  instances,  study  solutions  to  these  instances,  and 
based  on  these  solutions  build  up  a  set  of  heuristics  that  will  enable  us  to  solve  the  problem  fast.  In  this 
sense,  the  learning  process  has  helped  improve  our  computational  efficiency.  Similarly,  given  some  other 
problem,  say  Rubik’s  cube,  and  the  instructions  concerning  its  solution,  we  would  like  to  become 
proficient at it just as quickly. In essence, we would like to behave in the following manner: given the
specification  of  a  problem,  we  quickly  learn  to  be  efficient  at  solving  the  problem.  Stated  more  abstractly: 
Consider  a  class  of  problems,  such  that  each  problem  in  the  class  is  known  to  possess  an  efficient 
algorithm.  We  are  interested  in  a  meta-algorithm  for  the  class  -  an  algorithm  that  takes  as  input  the 
specification  of  a  problem  drawn  from  the  class  as  well  as  sample  instances  of  the  problem,  and  produces 
as  output  an  efficient  algorithm  for  the  problem.  As  we  will  see,  the  sample  instances  play  a  crucial  role  in 
the  process,  as  in  their  absence,  constructing  an  algorithm  for  the  input  problem  can  be  computationally 
intractable. In this paper, we are interested in examining learning in the aforementioned sense.
Specifically,  we  inquire  into  the  conditions  under  which  such  learning  is  possible.  Our  methods  of 
analysis  are  probabilistic  in  flavour,  akin  to  those  of  Valiant  [1]. 

In  Section  2,  we  present  a  formal  definition  of  the  learning  framework.  The  framework  formalizes 
learning  in  the  above  sense,  demanding  that  the  learner  learn  to  solve  a  problem,  given  a  source  of 
randomly  chosen  solved  instances  of  the  problem.  We  prove  a  theorem  identifying  conditions  sufficient  to 
allow such learning. In Section 3, we consider an application of our theorem to a restricted version of
symbolic integration. In particular, we show how to construct an algorithm that is capable of learning to
solve such restricted classes of integrals from randomly chosen examples. In Section 4, we change the
source  of  sample  instances  to  one  that  provides  unsolved  instances  that  are  chosen  in  a  random  but 
slightly  benevolent  manner.  Specifically,  rather  than  present  the  learning  algorithm  with  randomly  chosen 
solved  instances  of  the  problem,  the  learning  algorithm  is  only  allowed  randomly  chosen  "exercises"  on 
the  problem  -  unsolved  instances  of  the  problem,  chosen  according  to  a  probability  distribution  measuring 
their  importance  to  the  learner.  This  is  very  much  the  same  as  the  exercises  in  a  work-book,  such  as  one 
might find at the end of a book dealing with, say, symbolic integration or differential equations. We are
able to prove that the conditions sufficient for learning from solved instances are sufficient for learning
here  as  well.  The  proof  is  constructive  in  that  we  give  a  general  learning  algorithm  that  learns  by  solving 
the  exercises,  solving  them  in  order  of  least  difficult  to  most  difficult.  This  theorem  constitutes  our  main 
result. 


2.  Learning  From  Solved  Instances 

Let $\Sigma$ be the $\{0,1\}$ boolean alphabet.

Defn: A problem D is the pair (G, O), where

(a) The goal $G: \Sigma^* \to \{0,1\}$ is a function from $\Sigma^*$ to $\{0,1\}$ computable in polynomial time.

(b) O is a finite set of operators $\{o_1, o_2, \ldots\}$ where each $o_i: \Sigma^* \to \Sigma^*$ is a function computable in
polynomial time.

A specification of a problem D = (G, O) is a set of programs for G and O that run in polynomial time.

Defn: We say an instance $x \in \Sigma^*$ of a problem D = (G, O) is solvable if there exists a sequence of
operators $\sigma$ such that $G(\sigma(x)) = 1$. The sequence $\sigma$ is a solution sequence for x. The sequence length of
a solution sequence is the number of operator applications in it, i.e., the length of $\sigma$ is $|\sigma|$. Unless
demanded by context, we use the term length to refer to the sequence length of a solution sequence. A
solution sequence $\sigma$ is optimal for x if its length is as short as that of any solution sequence for x.

Defn: Let $\sigma = p_1 p_2 \cdots p_r$ be a solution sequence to x, where the $p_i$ are operators in O and the rightmost operator is applied first. We say
$x$, $p_r(x)$, $p_{r-1}p_r(x)$, ... are steps in the solution of x and that $p_r(x)$, $p_{r-1}p_r(x)$, ... are intermediate steps in the
solution of x. The step-length of $\sigma$ with respect to x is the maximum of $\{|x|, |p_r(x)|, |p_{r-1}p_r(x)|, \ldots\}$, i.e., it is
the length of the longest instance encountered in using $\sigma$ to solve x.
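The definitions above can be made concrete with a small Python sketch. The toy problem here is not from the paper: instances are bit strings, the goal holds on the empty string, and two hypothetical operators `strip0` and `strip1` delete a matching leading bit. All names are assumptions made for illustration. A solution is stored as a list in application order, which is the reverse of the written order of $\sigma$.

```python
from typing import Callable, List

Operator = Callable[[str], str]


def strip0(x: str) -> str:
    """Hypothetical operator: delete a leading '0', else leave x unchanged."""
    return x[1:] if x.startswith("0") else x


def strip1(x: str) -> str:
    """Hypothetical operator: delete a leading '1', else leave x unchanged."""
    return x[1:] if x.startswith("1") else x


def goal(x: str) -> int:
    """Toy goal G: 1 exactly on the empty string."""
    return 1 if x == "" else 0


def steps(x: str, applied: List[Operator]) -> List[str]:
    """Return x followed by every instance reached while applying the operators in order."""
    trace = [x]
    for op in applied:
        x = op(x)
        trace.append(x)
    return trace


def is_solution(x: str, applied: List[Operator]) -> bool:
    """A sequence solves x if the goal holds on the last step."""
    return goal(steps(x, applied)[-1]) == 1


def sequence_length(applied: List[Operator]) -> int:
    """Number of operator applications."""
    return len(applied)


def step_length(x: str, applied: List[Operator]) -> int:
    """Length of the longest instance encountered, including x itself."""
    return max(len(s) for s in steps(x, applied))
```

For the instance "011" and the sequence `[strip0, strip1, strip1]`, the steps are "011", "11", "1" and the empty string, so the sequence length and the step-length are both 3.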

Defn: An algorithm for a problem D is a program that takes as input a string $x \in \Sigma^*$ and produces as
output a solution sequence for x, if such exists.

A  family  of  problems  M  is  simply  any  set  of  problems.  We  are  interested  in  an  algorithm  that  is 
useful  over  a  family  of  problems,  in  that  it  is  capable  of  learning  to  solve  any  of  the  problems  in  the  family. 
To  this  end,  we  define  the  notion  of  a  meta-algorithm  for  a  family.  Loosely  speaking,  a  meta-algorithm  for 
a  family  M  is  an  algorithm  that  takes  as  input  the  specification  of  a  problem  D  in  M  and  attempts  to 
construct  an  algorithm  for  D.  Given  the  scope  of  our  definition  of  a  family  of  problems,  it  is  easy  to  see 
that the task of the meta-algorithm will be NP-hard for most non-trivial families. See [4]. This is true, even
if  we  guarantee  that  every  problem  in  the  family  has  a  polynomial-time  algorithm  -  the  difficulty  lies  in 
finding  such  an  algorithm,  given  the  specification  of  the  problem.  In  order  to  reduce  this  complexity  and 
thereby  aid  the  meta-algorithm  in  its  task,  we  provide  the  meta-algorithm  with  sample  instances  of  the 
problem specified in its input. Specifically, we consider two distinct sources of such sample instances,
one providing the meta-algorithm with randomly chosen solved instances, and the other providing
unsolved  instances  that  are  randomly  chosen,  although  in  a  slightly  more  benevolent  manner  than  the 
first  source.  The  first  source  is  the  simpler  to  analyze  and  will  be  the  subject  of  the  remainder  of  this 
section.  The  second  source  is  considered  in  Section  4. 

We place at the disposal of the meta-algorithm a subroutine INSTANCE which acts as a random
source of solved instances. We may view INSTANCE as a black box with a button, such that at each
push of the button, INSTANCE outputs a randomly chosen solved instance of the input problem D.
Specifically, at each call, INSTANCE returns a pair $(x, \sigma)$. The string $x \in \Sigma^*$ is randomly drawn according to
an arbitrary and unknown probability distribution P on $\Sigma^*$. The operator sequence $\sigma$ is a randomly chosen
optimal solution sequence for x, being the null-sequence if x is not solvable or if x is solved as it is. By
randomly  chosen,  we  mean  that  at  any  stage  in  the  solution  of  x,  the  next  operator  used  by  INSTANCE  is 
picked  randomly  from  among  those  that  are  useful.  In  order  to  make  this  precise,  we  need  the  following 
definition. 

Defn: Let D = (G, O) be a problem. For each operator $o \in O$, consider the set

$$U(o) = \{x \mid \exists \text{ an optimal solution of the form } \sigma o \text{ for } x\}.$$

We call $U(o)$ the projection of o, and $U(D) = \{U(o) \mid o \in O\}$ the projections of D.

For any x in $\Sigma^*$, let $O_x$ be the set of operators useful on x, i.e.,

$$O_x = \{o \mid o \in O,\ x \in U(o)\}.$$

When solving x, the first operator used by INSTANCE is picked at random from $O_x$. Specifically, if there
are p operators in $O_x$, each is picked with probability $1/p$.¹ Similarly, the second operator is picked at
random from $O_y$, where y is the result of applying the first operator to x. And so on.
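The behaviour of INSTANCE can be sketched in Python under strong simplifying assumptions. The sketch finds optimal solution lengths by bounded breadth-first search, which is only feasible for tiny toy problems and is not how the paper assumes INSTANCE is realized. The names `optimal_length`, `useful_operators`, `instance`, `draw` and the toy operators are all hypothetical.

```python
import random
from collections import deque
from typing import Callable, Dict, List, Optional, Tuple

Ops = Dict[str, Callable[[str], str]]


def optimal_length(x: str, goal, ops: Ops, max_depth: int = 20) -> Optional[int]:
    """Length of a shortest solution of x, found by breadth-first search (None if not found)."""
    seen = {x}
    frontier = deque([(x, 0)])
    while frontier:
        y, depth = frontier.popleft()
        if goal(y) == 1:
            return depth
        if depth < max_depth:
            for op in ops.values():
                z = op(y)
                if z not in seen:
                    seen.add(z)
                    frontier.append((z, depth + 1))
    return None


def useful_operators(x: str, goal, ops: Ops) -> List[str]:
    """O_x: names of operators that begin some optimal solution of x."""
    best = optimal_length(x, goal, ops)
    if not best:  # None (no solution within the bound) or 0 (already solved)
        return []
    return [name for name, op in ops.items()
            if optimal_length(op(x), goal, ops) == best - 1]


def instance(draw, goal, ops: Ops, rng: random.Random) -> Tuple[str, List[str]]:
    """Return (x, solution), choosing uniformly among useful operators at each step.

    The solution is listed in application order; it is empty if x is
    unsolvable or already solved.
    """
    x = draw(rng)
    y, applied = x, []
    while True:
        choices = useful_operators(y, goal, ops)
        if not choices:
            return x, applied
        name = rng.choice(choices)
        applied.append(name)
        y = ops[name](y)


# Toy problem (an assumption, not from the paper): strip leading bits until empty.
TOY_OPS: Ops = {
    "strip0": lambda x: x[1:] if x.startswith("0") else x,
    "strip1": lambda x: x[1:] if x.startswith("1") else x,
}


def toy_goal(x: str) -> int:
    return 1 if x == "" else 0
```

In the toy problem each instance has exactly one useful operator at every step, so the random choice is forced. A problem with several optimal solutions would exercise the uniform choice among the operators in $O_x$.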

With these definitions in hand, we attempt to make precise our notion of a meta-algorithm. In
essence, a meta-algorithm A for a family of problems M will take as input an error parameter h and the
specification of a problem D in M. A will then compute for time polynomial in various parameters and
output a program H that efficiently approximates an algorithm for D. By this we mean that
H will behave like an algorithm for D with probability $(1-1/h)$. A formal definition follows.

Defn: An algorithm A is a meta-algorithm for a family of problems M if there exists an integer k such
that

(a) A takes as input an integer h and the specification of a problem $D \in M$. Let l be the string length
of this input.

(b) A may call INSTANCE. INSTANCE returns examples for D, chosen according to some unknown
distribution P over $\Sigma^*$. Let n be the longest step-length and m the longest sequence length of the
solutions so provided by INSTANCE. For inputs of length n, let $t(n)$ be the sum of the running
times of the programs in the specification of D. A computes in time $(l\,h\,m\,t(n))^k$, i.e., in time
polynomial in the length of its input l, the error parameter h, the length m of the solutions seen, and the time
required to evaluate the programs in the specification of D on the examples seen. A may be a randomized algorithm.

(c) For all $D \in M$ and all distributions P over $\Sigma^*$, with probability $(1-1/h)$ A outputs a (possibly
randomized) program H that runs in time polynomial in l and $t(n)$ on inputs of length n and approximates an algorithm
for D in the sense that

$$\sum_{x \in S} P(x) < 1/h,$$

where $S = \{x \mid H \text{ fails on } x\}$.

Since H may be randomized, by "H fails on x", we mean that H fails to solve x with probability
greater than 1/2, although x is solvable.

¹ It is sufficient if each is picked with probability at least $1/\mathrm{poly}(n)$, where $n = |x|$ and $\mathrm{poly}(n)$ denotes a polynomial in n.
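The accuracy condition in (c) can be illustrated with a short Python sketch for the special case of a deterministic H and a distribution P with finite support, where the sum can be computed exactly. The function names and the convention that H returns `None` on failure are assumptions for illustration only.

```python
from typing import Callable, Dict, List, Optional


def failure_mass(H: Callable[[str], Optional[List[str]]],
                 P: Dict[str, float],
                 solvable: Callable[[str], bool]) -> float:
    """Sum of P(x) over the solvable instances x on which H returns no solution."""
    return sum(p for x, p in P.items() if solvable(x) and H(x) is None)


def meets_accuracy(H, P: Dict[str, float], solvable, h: int) -> bool:
    """Check the condition sum_{x in S} P(x) < 1/h from part (c)."""
    return failure_mass(H, P, solvable) < 1.0 / h
```

For a randomized H, membership of x in S would instead have to be decided from the probability that H fails on x, which this sketch does not attempt.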


We now inquire into the conditions under which a family of problems possesses a meta-algorithm.
Theorem 1 identifies conditions sufficient to guarantee the existence of a meta-algorithm. Necessary
conditions appear to be much harder to obtain, perhaps requiring a greater understanding of learning with
"advice" as explored in [4]. The statement and proof of Theorem 1 are based on previous results on
learning  sets  with  one-sided  error  [3].  These  results  are  reviewed  briefly  in  Appendix  A.  We  refer  the 
unfamiliar  reader  to  that  section  before  proceeding  to  the  theorem. 


Theorem 1: A family of problems M possesses a meta-algorithm if there exists a family of sets F
such that

(a) F contains the projections of every problem D in M.

(b) F is polynomial-time learnable with one-sided error. (See Appendix A for details.)


Proof: (sketch) For a given problem D, if we can test membership in the projections of D efficiently,
then we can construct an efficient algorithm for D. The following is such an algorithm.

input x: string;
begin
    σ ← null-sequence;
    while G(x) ≠ 1 do
        pick o ∈ O such that x ∈ U(o);
        if no such exists, halt;  -- x is not solvable --
        x ← o(x);
        σ ← o σ;
    end
    output σ as solution for x;
end

The  key  idea  in  the  proof  is  as  follows:  Given  a  problem  D,  the  meta-algorithm  will  construct 
approximations  to  the  projections  of  D  using  the  solved  instances.  It  will  then  substitute  these 
approximations  in  the  above  algorithm  to  obtain  an  approximate  algorithm  for  D.  If  the  conditions  of  the 
theorem  are  satisfied,  this  can  be  carried  out  in  random  polynomial-time,  yielding  a  good  approximation  of 
an algorithm for D.
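The algorithm in the proof sketch can be rendered in Python as follows. This is a minimal sketch rather than the paper's implementation: `member` is a hypothetical dictionary of membership tests standing in for the projections $U(o)$ or their learned approximations, `max_steps` is an added safety bound, and the toy operators are assumptions. The solution is returned in application order.

```python
from typing import Callable, Dict, List, Optional


def solve(x: str,
          goal: Callable[[str], int],
          ops: Dict[str, Callable[[str], str]],
          member: Dict[str, Callable[[str], bool]],
          max_steps: int = 1000) -> Optional[List[str]]:
    """Apply an operator whose (approximate) projection contains x until the goal holds."""
    applied: List[str] = []
    while goal(x) != 1:
        candidates = [name for name in ops if member[name](x)]
        if not candidates or len(applied) >= max_steps:
            return None  # no applicable operator: report failure
        name = candidates[0]  # any candidate will do when the tests are exact
        x = ops[name](x)
        applied.append(name)
    return applied


# Toy problem (an assumption): strip leading bits until the string is empty.
TOY_OPS = {
    "strip0": lambda x: x[1:] if x.startswith("0") else x,
    "strip1": lambda x: x[1:] if x.startswith("1") else x,
}
TOY_MEMBER = {
    "strip0": lambda x: x.startswith("0"),  # U(strip0) in the toy problem
    "strip1": lambda x: x.startswith("1"),  # U(strip1) in the toy problem
}


def toy_goal(x: str) -> int:
    return 1 if x == "" else 0
```

With exact membership tests the solver never fails on a solvable toy instance. Replacing a test by a strict subset of the projection, as a one-sided learner would, can only make the solver halt early and never leads it to a wrong operator.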


The rest of the proof deals with the details. Specifically, we will exhibit a meta-algorithm for M. We
need the following definition. Let D be a problem in M. We define the quantity $I_D(n)$ to be the set of all
instances in D that possess optimal solutions of step-length at most n.

$$I_D(n) = \{x \mid x \text{ has an optimal solution in } D \text{ of step-length at most } n\}$$

When the problem D is clear from the context, we will simply write $I(n)$. Also, for $\delta \in (0,1)$ define the
quantity $n_\delta$ as the least integer n such that

$$\sum_{x \in I(n)} P(x) > 1-\delta.$$

That is, $n_\delta$ is the least integer such that the probability of occurrence of an optimal solution of step-length
greater than $n_\delta$ is less than $\delta$. In what follows, we will arrange for the meta-algorithm to learn
approximations to the projections of D that are good for strings of length $n_\delta$ or less, for a value of $\delta$ that will
be appropriately chosen.

Let F be a family as in the statement of the theorem. By Theorem A of Appendix A, F must possess
a  polynomial-time  ordering  Q.  We  use  Q  to  construct  a  meta-algorithm  A  for  M  as  shown  below.  The 
algorithm  uses  Q  to  construct  good  approximations  for  the  projections  of  D  and  then  uses  these 
projections to build an algorithm for D.

Meta-Algorithm A_1
Input h, D = (G, O);
Let F be of dimension d(n);
Let O = {o_i | i = 1, ..., k};
Let S(o_1), ..., S(o_k), V(o_1), ..., V(o_k) be sets, initially empty;
begin
    Section 1:
        -- This section estimates n_{1/3h} with confidence (1 - 1/3h) --
        call INSTANCE 3h log(3h) times.
        Let n be the longest step-length amongst those seen.
    Section 2:
        -- This section generates examples for projections --
        repeat 3h(k d(n) + log(3h)) times
            call INSTANCE to obtain (x, σ);
            let σ be the sequence o_{j_1} o_{j_2} ... o_{j_r};
            S(o_{j_r}) ← S(o_{j_r}) ∪ {x};
            S(o_{j_{r-1}}) ← S(o_{j_{r-1}}) ∪ {o_{j_r}(x)};
            ...
            S(o_{j_1}) ← S(o_{j_1}) ∪ {o_{j_2} ... o_{j_r}(x)};
        end
    Section 3:
        -- This section constructs approximations of projections --
        repeat for i = 1, ..., k
            V(o_i) ← Q(S(o_i));
            if Q is randomized, repeat to confidence of 1 - 1/3h;
        end
    Section 4:
        Output the following as an approximate algorithm for D
        Algorithm H
        input x: string;
        begin
            σ ← null-sequence;
            while G(x) ≠ 1 do
                let O_x = {o | x ∈ V(o)};
                if O_x is empty then halt
                else pick o in O_x uniformly randomly;
                x ← o(x);
                σ ← o σ;
            end
            output σ as solution for x;
        end
end
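The bookkeeping in Section 2 of the meta-algorithm, which turns one solved instance into one positive example per operator used, can be sketched in Python as below. The function name, the dictionary representation of the sets $S(o)$ and the toy operators are assumptions. The solution is taken in application order, so its first entry corresponds to the rightmost operator of $\sigma$.

```python
from typing import Callable, Dict, List, Set


def record_solved_instance(x: str,
                           applied: List[str],
                           ops: Dict[str, Callable[[str], str]],
                           S: Dict[str, Set[str]]) -> None:
    """Add each step of the solution to the example set of the operator applied at that step."""
    for name in applied:
        S.setdefault(name, set()).add(x)  # x lies in the projection of this operator
        x = ops[name](x)                  # move on to the next intermediate step


# Toy operators (an assumption, not from the paper).
TOY_OPS = {
    "strip0": lambda x: x[1:] if x.startswith("0") else x,
    "strip1": lambda x: x[1:] if x.startswith("1") else x,
}
```

Since INSTANCE only returns optimal solutions, every example recorded this way is a member of the corresponding projection, which is what learning with one-sided error requires.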


We now show that the above is indeed a meta-algorithm for M. Consider Section 1 of the algorithm.
We need to show that drawing $3h\log(3h)$ instances will produce a step-length n such that $n \ge n_{1/3h}$. For
any single call of INSTANCE, the probability of a step-length of less than $n_{1/3h}$ occurring is at most $(1-1/3h)$ by
definition. In t calls of INSTANCE, the probability of all the step-lengths being less than $n_{1/3h}$ is hence at most
$(1-1/3h)^t$. We only need pick t such that

$$(1-1/3h)^t \le 1/3h,$$

which inequality is satisfied by choosing $t = 3h\log(3h)$.
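The choice of t can be checked numerically. The Python sketch below assumes the logarithm is natural (the text does not state the base; a larger base-2 value would only increase t) and rounds t up to an integer. The function names are hypothetical.

```python
import math


def calls_for_step_length(h: int) -> int:
    """Number of INSTANCE calls t = 3h ln(3h), rounded up."""
    return math.ceil(3 * h * math.log(3 * h))


def miss_probability_bound(h: int, t: int) -> float:
    """Upper bound (1 - 1/3h)^t on the chance that all t step-lengths fall below n_{1/3h}."""
    return (1.0 - 1.0 / (3 * h)) ** t
```

The check works because $1 - y \le e^{-y}$, so $(1-1/3h)^t \le e^{-t/3h} \le 1/3h$ once $t \ge 3h\ln(3h)$.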

We will consider Sections 2, 3, and 4 of the algorithm simultaneously. With respect to strings of
length n or less, each set $V(o_i)$ can be chosen in $|F_n|$ ways in Section 3 of the algorithm. Hence, the
number of distinct algorithms that can be constructed in Section 4 is $|F_n|^k$. Let S be the set of algorithms
so constructible. If $n \ge n_{1/3h}$, at least one of these algorithms will approximate an algorithm for D within
$1/3h$. This is because the statement of the theorem demands that F contains the projections of D. Now,
the aim of Sections 2 and 3 is to eliminate those algorithms in S that are bad approximations. Consider
algorithms in S that do not approximate an algorithm for D within $1/3h$. Call such algorithms "bad". The
probability that a particular bad algorithm will correctly solve a randomly chosen instance is at most $(1-1/3h)$, and
the probability that the algorithm will correctly solve all of r randomly chosen instances is at most $(1-1/3h)^r$. The
probability that any bad algorithm in S will correctly solve r random instances is at most $|S|(1-1/3h)^r$. To
eliminate all bad algorithms in S with confidence $(1-1/3h)$, we only need to make the above quantity less
than $1/3h$. That is,

$$|S|(1-1/3h)^r < 1/3h.$$

Since $|S| \le |F_n|^k$ and $|F_n| \le 2^{d(n)}$, it suffices that

$$2^{k\,d(n)}(1-1/3h)^r < 1/3h,$$

or

$$r \ge 3h(k\,d(n) + \log(3h)).$$

This is exactly the number of instances employed by Sections 2 and 3 to eliminate the bad algorithms in
S. Since Sections 1, 2 and 3 are each carried out to a confidence of $(1-1/3h)$, the overall confidence is
$(1-1/h)$. Furthermore, the elimination of bad algorithms from S constructs an algorithm that approximates
an algorithm for D within $2/3h$. This is so because the best approximation within S need only be within
$1/3h$ owing to our choice of n, and the elimination process will construct an algorithm within $1/3h$ of this
best algorithm.

In all, with probability $(1-1/h)$ the meta-algorithm constructs an algorithm for the input problem D that
is within $2/3h$ in accuracy. Hence, A is a meta-algorithm for M and the theorem is proved.
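The sample size used in Sections 2 and 3 can be checked the same way. This Python sketch assumes natural logarithms and works with the logarithm of the bound $2^{k\,d(n)}(1-1/3h)^r$ to avoid overflow. The function names are hypothetical, and `d` stands for the value $d(n)$.

```python
import math


def examples_for_projections(h: int, k: int, d: int) -> int:
    """r = 3h(k d + ln 3h), rounded up, as used in Section 2 of the meta-algorithm."""
    return math.ceil(3 * h * (k * d + math.log(3 * h)))


def log_survival_bound(h: int, k: int, d: int, r: int) -> float:
    """Natural log of 2^{k d} (1 - 1/3h)^r, the bound on some bad algorithm surviving r examples."""
    return k * d * math.log(2) + r * math.log1p(-1.0 / (3 * h))
```

The stated r is slightly larger than strictly needed under this reading, since the derivation only requires $r \ge 3h(k\,d(n)\ln 2 + \ln 3h)$.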

3.  An  Application  to  Symbolic  Integration 

In this section we discuss an application of Theorem 1 to the domain of symbolic integration. There
have been reports in the AI literature of programs that learn to carry out restricted forms of symbolic
integration. See [6] for instance. We will show how this can be achieved by a straightforward application
of Theorem 1.


Consider the class of integrals that can be solved by the following standard integrals.

$$
\begin{aligned}
\int k f(x)\,dx &= k \int f(x)\,dx \\
\int (f(x) - g(x))\,dx &= \int f(x)\,dx - \int g(x)\,dx \\
\int (f(x) + g(x))\,dx &= \int f(x)\,dx + \int g(x)\,dx \\
\int x^n\,dx &= \frac{x^{n+1}}{n+1} \\
\int \sin x\,dx &= -\cos x \\
\int \cos x\,dx &= \sin x \\
\int u\,d(v) &= uv - \int v\,d(u)
\end{aligned}
$$

Suppose  we  wish  to  construct  an  algorithm  that  can  solve  this  class  of  integrals. 


Consider the following grammar $\Gamma$.

prob → ∫ exp var | d(exp)
exp → term | term + exp | term - exp | term / term
term → p-term | p-term * term
p-term → const var | - term | trig power prob | exp
power → var ** term
trig → SIN var | COS var
const → int | a | k
var → x | y | z
int → 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 0

This  grammar  generates  a  superset  of  the  strings  that  will  be  seen  as  input  to  the  integration  algorithm. 

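Let $\alpha$ be any sentential form in the grammar $\Gamma$. Define $L(\alpha)$ to be the set of strings derivable in $\Gamma$ from $\alpha$.

That is,

$$L(\alpha) = \{x \mid \alpha \rightarrow^{*}_{\Gamma} x\}.$$

Let F be the family of all such sets, i.e.,

$$F = \{L(\alpha) \mid \alpha \text{ is a sentential form in } \Gamma\}.$$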


It is easy to see that F is polynomial-time learnable with one-sided error. To do so, we only need
invoke Theorem A of Appendix A and check the following. (a) F is closed under intersection. We show the
equivalent condition [3] that for any set of strings, there exists a "least" sentential form that generates
them. By least, we mean that the set generated by any other sentential form that generates these strings will be a superset of
the set generated by the least sentential form. To see this, given a set of strings we can efficiently compute the least sentential
form that generates them as follows. Construct the parse trees for these strings in $\Gamma$, and then march up
these parse trees simultaneously to pick off points common to all of them. Since the parse trees are
unique in $\Gamma$, the claim follows. (b) F possesses a polynomial-time ordering. Indeed, we will exhibit a
deterministic linear time ordering for F. For any set of strings, compute the least sentential form that
generates them as described above. Once we have this least sentential form, it is a simple matter to
output a program that recognizes strings that can be generated from it. (c) Since the number of sentential
forms of length n is at most $c^n$ for some constant c, F is of dimension $n \log(c)$.
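One way to read the construction of the least sentential form is as a simultaneous walk over the parse trees that keeps the parts on which all trees agree and cuts back to a nonterminal where they differ. The Python sketch below illustrates that reading and is not the paper's procedure. Parse trees are assumed to be given as nested tuples `(nonterminal, child, ...)` with terminals as plain strings, all trees are assumed to share the same root nonterminal, and terminal names are assumed to be distinct from nonterminal names.

```python
from typing import List, Sequence, Union

Tree = Union[str, tuple]  # a terminal string, or (nonterminal, child, child, ...)


def label(t: Tree) -> str:
    """Terminal symbol itself, or the nonterminal at the root of a subtree."""
    return t if isinstance(t, str) else t[0]


def generalize(trees: Sequence[tuple]) -> List[str]:
    """Return a sentential form, as a list of symbols, that generates every given tree."""
    root = trees[0][0]
    # The production used at the root of each tree, described by its child labels.
    shapes = {tuple(label(c) for c in t[1:]) for t in trees}
    if len(shapes) != 1:
        return [root]  # the trees disagree here: keep the nonterminal itself
    form: List[str] = []
    for i, child_label in enumerate(next(iter(shapes))):
        children = [t[1 + i] for t in trees]
        if isinstance(children[0], str):
            form.append(child_label)            # same terminal in every tree
        else:
            form.extend(generalize(children))   # descend into the common production
    return form
```

For the two trees of `SIN x` and `SIN y` under the production trig → SIN var, the result is the form `SIN var`. Adding the tree of `COS x` forces the result back to the single nonterminal `trig`.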

We  now  hope  that  F  contains  the  projections  of  all  the  standard  integrals  listed  earlier.  (To  be 
honest, it does contain them.) We can then invoke the meta-algorithm of Theorem 1, and provide it with
randomly chosen solved instances of these integrals. By Theorem 1, the output of the meta-algorithm will
indeed be a good algorithm for the class of integrals in question. Tadepalli, in [4], implemented this
algorithm and verified this to be the case.

4.  Learning  From  Exercises 

In  the  foregoing,  we  considered  a  model  of  learning  wherein  the  external  agent  INSTANCE  provided 
solved  instances  of  the  problem  of  interest.  In  this  section,  we  consider  a  model  of  learning  wherein  the 
external  agent  provides  unsolved  instances  of  the  problem  of  interest,  although  these  instances  are 
chosen  a  little  more  carefully  than  in  the  previous  model.  The  unsolved  instances  are  exercises,  in  much 
the  same  sense  as  those  that  may  be  found  at  the  end  of  a  text  book  on  symbolic  integration.  Note  that 
the  exercises  in  the  back  of  the  book  are  not  representative  of  the  "natural"  distribution  of  problem 
instances,  but  are  chosen  to  reinforce  the  techniques  required  to  solve  them.  In  this  section,  we  formalize 
the notion of learning from exercises and prove a theorem similar to Theorem 1.

We now replace the routine INSTANCE of the previous section with a routine EX. The key idea is to
provide the learning algorithm with a source of unsolved instances of varying difficulty. This will permit
the learning algorithm to consider increasingly difficult instances, improving its capabilities as it
progresses. Let P be a probability distribution on $\Sigma^*$, and let INSTANCE be defined according to P as
described earlier. We can best describe EX in terms of INSTANCE, as shown below. In essence, EX
takes as argument an integer i and returns an instance x such that the optimal solution of x has length i.
The probability that a particular instance x will be returned by any call of EX is the probability that x will be
used in a solution by INSTANCE. This is a measure of the importance of knowing how to solve x, with
respect to the natural distribution P.

function EX(i)
begin
    call INSTANCE to obtain (x, σ);
    if |σ| < i, output the null instance.
    else
        let σ = σ_1 σ_2, where |σ_1| = i;
        output σ_2(x).
end
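EX can be sketched in Python on top of any source of solved instances. The sketch assumes that `instance` is a zero-argument callable returning `(x, solution)` with the solution in application order, and it uses `None` for the null instance. These conventions and the toy operators are assumptions made for illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


def ex(i: int,
       instance: Callable[[], Tuple[str, List[str]]],
       ops: Dict[str, Callable[[str], str]]) -> Optional[str]:
    """Return the step of a random solved instance from which exactly i operators remain."""
    x, applied = instance()
    if len(applied) < i:
        return None  # the null instance
    # Apply all but the last i operators; what is left has an optimal solution of length i.
    for name in applied[:len(applied) - i]:
        x = ops[name](x)
    return x


# Toy operators (an assumption, not from the paper).
TOY_OPS = {
    "strip0": lambda x: x[1:] if x.startswith("0") else x,
    "strip1": lambda x: x[1:] if x.startswith("1") else x,
}
```

Because the solution returned by INSTANCE is optimal, the remaining i operators form an optimal solution of the returned instance. This is the property the learner relies on when it solves exercises in order of increasing i.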


We now define the notion of a meta-algorithm for a family of problems in this setting. This definition
is largely identical to that of Section 2, except for the use of EX instead of INSTANCE.

Defn: An algorithm A is a meta-algorithm for a family of problems M if there exists an integer k such
that

(a) A takes as input an integer h and the specification of a problem $D \in M$. Let l be the string length of
this input.

(b) A may call EX. EX returns instances of D drawn according to some unknown distribution P over
$\Sigma^*$. Let n be the least integer such that all the instances so produced by EX are in $I(n)$, and let m
be the largest integer used as argument to EX. For inputs of length n, let the sum of the running
times of the programs in the specification of D be $t(n)$. A computes for time less than $(l\,h\,m\,t(n))^k$,
i.e., in time polynomial in the length of its input l, the error parameter h, the length m of the optimal
solutions of the instances seen, and the time required to evaluate the programs in the
specification of D on the instances seen. A may be a randomized algorithm.

(c) For all $D \in M$ and all distributions P over $\Sigma^*$, with probability $(1-1/h)$ A outputs a (possibly
randomized) program H that runs in time polynomial in l and $t(r)$ on inputs of length r and approximates an
algorithm for D in the sense that

$$\sum_{x \in S} P(x) < 1/h,$$

where $S = \{x \mid H \text{ fails on } x\}$.

Since H may be randomized, by "H fails on x", we mean that H fails to solve x with probability
greater than 1/2, although x is solvable.


We now inquire into the conditions under which a family of problems possesses a meta-algorithm in
this model. As it happens, the theorem we prove for this model is identical in its statement to Theorem 1.


Theorem 2: A family of problems M possesses a meta-algorithm if there exists a family of sets F
such that

(a) F contains the projections of every problem D in M.

(b) F is polynomial-time learnable with one-sided error. (See Appendix A for details.)

Note that this pertains to the model wherein the meta-algorithm seeks unsolved instances from
EX.


Proof: (Sketch) The key idea in this proof is similar to that of Theorem 1 - the meta-algorithm
constructs approximations to the projections of D. The catch is that it must provide solutions to the
instances on its own. To do so, the meta-algorithm iteratively learns to solve problems with increasingly
longer solution sequences. Specifically, the meta-algorithm first learns to solve problems with solution
sequences of length one. Knowing how to solve problems with solution sequences of length i, it learns to
solve problems with solutions of length i+1. In order to describe such an algorithm, we need the following
definition.

Defn: For $D \in M$ and $\delta \in (0,1)$ define the quantity $m_\delta$ to be the least integer m such that

$$\sum_{x \in S} P(x) > 1-\delta,$$

where $S = \{x \mid x \text{ has a solution of length } m \text{ or less in } D\}$.
Meta-Algorithm A_2
Input h, D = (G, O);
Let F be of dimension d(n);
Let O = {o_j | j = 1, ..., k};
Let S(o_1), ..., S(o_k), V(o_1), ..., V(o_k) be sets, initially empty;
begin
    Section 1:
        let α = 1/4h.
        Estimate m ≥ m_α to a confidence of (1 - α).
        Let ε = 1/(2hm²).
        Estimate n ≥ n_ε to a confidence of (1 - ε).
        Substitute the null sets for the V(o)'s in the algorithm of Section 3
        to obtain the algorithm H_0.
    Section 2:
        for i = 1, 2, ..., m do
            pick t such that t/ln(t) ≥ (1/ε)(k d(n) + ln(1/ε)) + ln(1/ε);
            call EX(i) t times;
            let E be the set of instances so obtained;
            for each o ∈ O and each x ∈ E do
                run H_{i-1} on o(x), repeating to a confidence of (1 - ε/kt);
                if H_{i-1} solves o(x) in i-1 steps then
                    S(o) = S(o) ∪ {x}
            od
            for each o ∈ O do
                V(o) = Q(S(o));
                if Q is randomized, repeat to confidence of (1 - ε)
            od
            construct the algorithm of Section 3 using the newly computed values of
            the V(o)'s. Call this algorithm H_i.
        od
    Section 3:
        Algorithm H
        Input x: string;
        begin
            σ ← null-sequence;
            while G(x) ≠ 1 do
                let O_x = {o | x ∈ V(o)};
                if O_x is empty then halt and report failure,
                else pick o in O_x uniformly randomly;
                x ← o(x);
                σ ← o σ;
            end
            output σ as solution for x;
        end
    Output H_m as an approximate algorithm for D
end
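The loop in Section 2 can be illustrated end to end on a toy problem. Everything in this Python sketch is an assumption made for illustration. Instances are bit strings to be emptied by `strip0` and `strip1`. `toy_ex` stands in for EX by returning the last i bits of a random string. The family F consists of sets of strings sharing a fixed prefix, so Q returns the longest common prefix of its examples. The sample size t is passed in directly instead of being derived from ε, and H picks the first candidate operator instead of a random one.

```python
import os
import random
from typing import Callable, Dict, List, Optional, Set

OPS: Dict[str, Callable[[str], str]] = {
    "strip0": lambda x: x[1:] if x.startswith("0") else x,
    "strip1": lambda x: x[1:] if x.startswith("1") else x,
}


def goal(x: str) -> int:
    return 1 if x == "" else 0


def toy_ex(i: int, rng: random.Random, max_len: int = 4) -> Optional[str]:
    """Stand-in for EX(i): the last i bits of a random string, or None if it is too short."""
    n = rng.randint(0, max_len)
    x = "".join(rng.choice("01") for _ in range(n))
    return x[n - i:] if n >= i else None


def Q(examples: Set[str]) -> Optional[str]:
    """Least prefix-set containing the examples, given by its prefix; None is the empty set."""
    return os.path.commonprefix(sorted(examples)) if examples else None


def make_H(V: Dict[str, Optional[str]]):
    """Build algorithm H from the current approximations V(o)."""
    def H(x: str, max_steps: int = 50) -> Optional[List[str]]:
        applied: List[str] = []
        while goal(x) != 1:
            cands = [o for o in OPS if V[o] is not None and x.startswith(V[o])]
            if not cands or len(applied) >= max_steps:
                return None  # halt and report failure
            o = cands[0]
            x = OPS[o](x)
            applied.append(o)
        return applied
    return H


def learn_from_exercises(m: int, t: int, rng: random.Random):
    S: Dict[str, Set[str]] = {o: set() for o in OPS}
    H = make_H({o: None for o in OPS})  # H_0: all V(o) empty
    for i in range(1, m + 1):
        E = [x for x in (toy_ex(i, rng) for _ in range(t)) if x is not None]
        for o, op in OPS.items():
            for x in E:
                sol = H(op(x))  # H here is still H_{i-1}
                if sol is not None and len(sol) == i - 1:
                    S[o].add(x)  # o starts an optimal solution of x
        H = make_H({o: Q(S[o]) for o in OPS})  # H_i
    return H
```

The test `len(sol) == i - 1` is what rules out operators that leave an exercise unchanged: such an exercise still needs i steps, so it is never credited to the wrong operator.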


We will prove the above meta-algorithm correct in stages. First we consider Section 1. The
estimation here is to be done exactly as in Section 1 of Meta-Algorithm 1, and the corresponding proof
holds.

We now consider Sections 2 and 3 simultaneously. We proceed by induction, with the following being
our inductive hypothesis. To simplify the proof, let us assume that our estimate n for $n_\epsilon$ is to a confidence
of unity. We will account for this at a later stage.

Inductive Hypothesis: In any run of the meta-algorithm, with probability $(1-\epsilon)^{4i}$,

$$\sum_{x \in S_i} P_i(x) \ge (1-\epsilon)^i \qquad \text{eqn(1)}$$

where $S_i = \{x \mid H_i$ is correct² on $x\}$ and $P_i$ is the conditional distribution given by
$P_i(x) = \Pr\{x \text{ is produced by any call of EX}(i) \mid x \in I(n)\}$.


Basis: For i = 0: $H_0$ produces the empty sequence as solution for the set $\{x \mid G(x) = 1\}$ and fails on
all other inputs. Hence $\sum_{x \in S_0} P_0(x) = 1$, and the inductive hypothesis is satisfied for i = 0.

Induction: Assume that the inductive hypothesis is true for (i-1) and prove true for i.


Let $S_i(o)$, $S_{i-1}(o)$, $V_i(o)$, $V_{i-1}(o)$ represent the sets S(o) and V(o) for operator o at the end of iterations i
and i-1 respectively of the outer for loop in the meta-algorithm. Now, consider the following algorithm.

Algorithm H'_i
Input x: string;
begin
    let O_x = {o | x ∈ V_i(o)};
    if O_x is empty then halt and report failure,
    else pick o in O_x uniformly randomly.
    x ← o(x).
    run H_{i-1} on x.
    if H_{i-1} solves x with solution σ
        output σ o and halt,
    else report failure.
end


$H'_l$ is different from $H_l$ in that it uses the $V_l$'s for deciding only on the first operator in the solution of
an input instance $x$. After that it runs $H_{l-1}$. By the inductive hypothesis, $H_{l-1}$ can be as inaccurate as
$(1-\epsilon)^{l-1}$. Hence, $H'_l$ cannot do better than that. The important thing is that it is possible to choose the
$V_l(o)$'s from $F$ so that this accuracy is attained. To see this, recall that $F$ contains the projection of $O$, that is, the
$U(o)$'s, and choosing $V_l(o) = U(o)$ for each $o$ will satisfy our demands. Furthermore, since the probability
distribution $P_l$ is non-zero only on instances of length $n$ (and the null instance), it follows that we could just
as well pick $V_l(o) = U(o) \cap \Sigma^n$. That is, we could pick $V_l(o)$ from $F_n$ rather than from $F$.


^1 By this we mean that $H_l$ solves $x$ with probability $\ge 1/2$ if $x$ is solvable.

We will now show how to construct good approximations to the $U(o)$'s so that the inductive
hypothesis may stand. Consider $H'_l$. For a given $n$, there are $|F_n|$ ways to choose each of the $k$ sets
$V_l(o)$, and hence there are at most $|F_n|^k$ choices for $H'_l$. Call a choice "bad" if it does not satisfy eqn(1) of
the inductive hypothesis. We wish to eliminate the bad choices. To do so, we will call EX($l$), so that if our
current choice is bad, EX($l$) will produce a witness to this with high probability. That is, EX($l$) will produce
an instance $x$ such that $x$ is not in $V_l(o)$ for any $o$, and yet there exists $o_i$ such that $o_i(x)$ can be solved by
$H_{l-1}$ in $l-1$ steps. Now, at any call of EX($l$), given that the call resulted in an instance $x \in I(n)$, the
probability that a bad choice of $H'_l$ will be correct on the instance produced is at most $(1-\epsilon)^l$. If we make $s$
calls of EX($l$), given that all of them resulted in instances from $I(n)$, the probability that a bad choice of $H'_l$
will be correct on all $s$ instances is at most $(1-\epsilon)^{ls}$. Hence, the probability that any bad choice of $H'_l$ will be
correct on all $s$ instances is bounded by $(1-\epsilon)^{ls}|F_n|^k$. We choose $s$ so that the probability of the above
event is at most $\epsilon$. That is, we choose $s$ so that

$$(1-\epsilon)^{ls}\,|F_n|^k \le \epsilon.$$

It certainly suffices to pick $s$ to satisfy
$s \ge (1/\epsilon)\,(k\,d(n) + \ln(1/\epsilon))$, where $d(n)$ is the dimension of $F$.
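
The sufficient sample size can be sanity-checked numerically. The sketch below is illustrative only: the function names are hypothetical, it uses $|F_n| \le 2^{d(n)}$ from the definition of dimension in Appendix A, and it checks the bound $(1-\epsilon)^{s}\,2^{k\,d(n)} \le \epsilon$, which is the weakest case of the condition above (a larger exponent on $(1-\epsilon)$ only makes the left side smaller).

```python
import math


def sample_size(eps: float, k: int, d_n: int) -> int:
    """Smallest integer s with s >= (1/eps) * (k*d_n + ln(1/eps))."""
    return math.ceil((k * d_n + math.log(1.0 / eps)) / eps)


def bad_choice_bound(eps: float, k: int, d_n: int, s: int) -> float:
    """Value of (1-eps)^s * 2^(k*d_n), evaluated in log space.

    Working with logarithms avoids overflow when k*d_n is large.
    """
    log_bound = s * math.log1p(-eps) + k * d_n * math.log(2.0)
    return math.exp(log_bound)


def check(eps: float, k: int, d_n: int) -> bool:
    """True if the chosen s drives the bad-choice bound below eps."""
    s = sample_size(eps, k, d_n)
    return bad_choice_bound(eps, k, d_n, s) <= eps
```

For example, with `eps = 0.1`, `k = 3` operators and `d_n = 10`, `sample_size` returns a value a little above 320, and `check` confirms that the bound falls below `eps`.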

But by our choice of $n$, the probability that any call of EX($l$) will result in an instance from $I(n)$ is only $(1-\epsilon)$.
Hence, we will call EX($l$) $t$ times, for some $t > s$ so that with probability $(1-\epsilon)$, these $t$ calls will result in at
least $s$ instances from $I(n)$. A simple Chernoff estimate yields that $t$ should satisfy $t/\ln(t) \ge s + \ln(1/\epsilon)$.
Such a choice would imply that with probability $(1-\epsilon)^2$, we have eliminated the bad choices for $H'_l$, i.e., with
probability $(1-\epsilon)^2$, $H'_l$ satisfies eqn(1), given that $H_{l-1}$ satisfies eqn(1).

We also have to account for verifying these witnesses. That is, given an instance $x$, for each
operator $o$, we must run $H_{l-1}$ on $o(x)$. Since $H_{l-1}$ is randomized, it has a certain probability of failure and
this must be accounted for. To do so, we run $H_{l-1}$ sufficiently many times so that our confidence in the
result is $(1-\epsilon/kt)$. This will require $O(\ln(kt/\epsilon))$ repetitions. Since we must run $H_{l-1}$ on $kt$ inputs, our
simultaneous confidence in the results of all the $kt$ computations is $(1-\epsilon/kt)^{kt}$, which is bounded below by $(1-\epsilon)$.
Finally, we note that picking a candidate $V_l(o)$ from $F_n$ is done with the ordering $Q$, which may be
randomized^2. We carry out this computation to a confidence of $(1-\epsilon/k)$ for each operator $o$, leading to a
confidence of $(1-\epsilon/k)^k \ge (1-\epsilon)$ for all the $k$ operators. Combining the above estimates with the result of the
last paragraph, we conclude that with probability $(1-\epsilon)^4$, $H'_l$ satisfies eqn(1), given that $H_{l-1}$ satisfies
eqn(1). By the inductive hypothesis, $H_{l-1}$ satisfies eqn(1) with probability $(1-\epsilon)^{4(l-1)}$. Therefore, $H'_l$
satisfies eqn(1) with probability

$$(1-\epsilon)^{4(l-1)}(1-\epsilon)^4 = (1-\epsilon)^{4l}.$$

Then, since $S_{l-1}(o) \subseteq S_l(o)$ for each $o$, it follows from the definitions of Appendix A that $V_{l-1}(o) \subseteq V_l(o)$.
This directly implies that the set of instances solved by $H'_l$ is a subset of the set of problems solved by $H_l$.
Therefore, $H_l$ satisfies the inductive hypothesis as well.
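
The repetition count used when verifying witnesses can be illustrated with a short calculation. The sketch below rests on assumptions not spelled out in the text: each run of the randomized solver succeeds independently with probability at least 1/2 on a solvable instance (the guarantee stated in the footnote), so that $r$ runs all fail with probability at most $2^{-r}$. The function names are hypothetical.

```python
import math


def repetitions(eps: float, k: int, t: int) -> int:
    """Number of runs r such that 2^(-r) <= eps / (k*t).

    Assumes independent runs, each succeeding with probability >= 1/2.
    """
    return max(1, math.ceil(math.log2(k * t / eps)))


def joint_confidence(eps: float, k: int, t: int) -> float:
    """(1 - eps/(k*t))^(k*t): confidence that all k*t verifications are right."""
    n = k * t
    return (1.0 - eps / n) ** n
```

With `eps = 0.1`, `k = 3` and `t = 100`, twelve repetitions per input suffice under these assumptions, and the joint confidence over all 300 verifications stays above `1 - eps`, as Bernoulli's inequality predicts.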


^2 Condition (b) of the definition of ordering Q, Appendix A.

We now seek to bound the error of $H_m$ with respect to the natural distribution $P$. Specifically, we
seek a lower bound on the following quantity.

$$\sum_{x \in S} P(x)$$

where $S = \{x \mid H_m$ is correct on $x\}$.

Let $N$ be the set of instances that are not solvable.

$N = \{x \mid x$ is not solvable$\}$.

We define the following sets, parametric in $l$, with respect to $H_l$.

$X_l = \{x \mid x \in I(n)$, optimal solution of $x$ has $l$ steps, $H_l$ solves $x\}$

$Y_l = \{x \mid$ optimal solution of $x$ has fewer than $l$ steps or $x$ is not solvable$\}$.

$Z_l = \{x \mid$ optimal solution of $x$ has more than $l$ steps$\}$.

Also, for an instance $x$, define the event $B(x)$ as follows.

$B(x) = \{x$ is an intermediate step in the solution produced by INSTANCE$\}$


Now consider the sum $\sum_{x \in S_l} cP_l(x)$. We decompose this sum as follows.

£  cPf^x)  =  y  P(x)+  y  /’r{B(x))+  ^  P(x). 

In the above, $c$ is a normalization factor to account for the fact that $P_l$ is conditional on those instances
that are in $I(n)$. By our choice of $n$ (recall that we are still under the assumption that our estimate
is of confidence unity), this normalization factor satisfies $c \ge (1-\epsilon)$. To see this, simply note that
$\sum_{x \in I(n)} P(x) \ge 1-\epsilon$ by the choice of $n$. By the definitions of $B(x)$, $X_l$, and $Z_l$,

y  Pr(fl(x))  <  ^  /’(x)  e<7n(:) 

xs  le  Z, 


Therefore, 


y  f>r{fl(x))+  ^  P(x)  <  ^  /’(x)+  y  P(x)  <  1.  eqnO) 

x€T;_,  xeK,  xeZ,  xtl-, 

Summing $\sum_{x \in S_l} cP_l(x)$ for $l = 0, 1, 2, \ldots, m$ and substituting eqn(3) in the sum $(m-1)$ times we obtain,

SSxe5,  S5xrv,  X€;V 

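Using eqn(2) to replace the second term on the right, we get

$$\sum_{l=0}^{m} \sum_{x \in S_l} cP_l(x) \le \sum_{l=0}^{m} \sum_{x \in X_l} P(x) + \sum_{x \in Z_m} P(x) + (m-1) + \sum_{x \in N} P(x)$$

But by our choice of $m$, with probability $(1-\alpha)$, $\sum_{x \in Z_m} P(x) \le \alpha$. Therefore we can rewrite our inequality
thus, to hold with probability $(1-\alpha)$.

$$\sum_{l=0}^{m} \sum_{x \in S_l} cP_l(x) \le \sum_{l=0}^{m} \sum_{x \in X_l} P(x) + \alpha + (m-1) + \sum_{x \in N} P(x) \qquad \text{eqn(4)}$$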


Now, by the inductive hypothesis, with probability $(1-\epsilon)^{4l}$,

$$\sum_{x \in S_l} P_l(x) \ge (1-\epsilon)^l$$

Hence, since $(1-\epsilon)^l \ge (1-\epsilon)^m$ for every $l \le m$,

$$\sum_{l=0}^{m} \sum_{x \in S_l} P_l(x) \ge m(1-\epsilon)^m \qquad \text{eqn(5)}$$

Noting that eqn(4) and eqn(5) hold with probability $(1-\alpha)$ and probability $(1-\epsilon)^{4m}$ respectively, we can
substitute eqn(5) in eqn(4) to write: With probability $(1-\epsilon)^{4m}(1-\alpha)$,

$$cm(1-\epsilon)^m \le \sum_{l=0}^{m} \sum_{x \in X_l} P(x) + \alpha + (m-1) + \sum_{x \in N} P(x) \qquad \text{eqn(6)}$$

Grouping the first and last terms on the right hand side and substituting $c \ge (1-\epsilon)$, we get,

$$\sum_{x \in S} P(x) \ge m(1-\epsilon)(1-\epsilon)^m - \alpha - (m-1) \qquad \text{eqn(7)}$$

where $S = \{x \mid H_m$ is correct on $x\}$. We desire the quantity on the right hand side to be greater than $(1-1/h)$.
Simplifying, we find that $\epsilon \le 1/(2hm^2)$ suffices.
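
The simplification can be checked numerically. The sketch below evaluates the right hand side of eqn(7) at $\epsilon = 1/(2hm^2)$ and compares it with $1 - 1/h$. It makes one assumption that is not stated in this section: the quantity written $\alpha$ is taken equal to $\epsilon$. The function names are hypothetical.

```python
def eqn7_lower_bound(m: int, eps: float, alpha: float) -> float:
    """Right hand side of eqn(7): m(1-eps)(1-eps)^m - alpha - (m-1)."""
    return m * (1.0 - eps) * (1.0 - eps) ** m - alpha - (m - 1)


def choice_suffices(m: int, h: int) -> bool:
    """Check eqn(7) against the target 1 - 1/h at eps = 1/(2*h*m^2)."""
    eps = 1.0 / (2 * h * m * m)
    alpha = eps                      # assumption: alpha is set equal to eps
    return eqn7_lower_bound(m, eps, alpha) >= 1.0 - 1.0 / h
```

Under this assumption the check passes for every $m \ge 2$ tried, which agrees with the estimate $m(1-\epsilon)^{m+1} \ge m - m(m+1)\epsilon$ from Bernoulli's inequality. For $m = 1$ it does not pass, so the sketch should be read for $m \ge 2$ only.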

Finally, we estimate our confidence that eqn(7) holds. Under the assumption that our estimate $n$
was to unit confidence, we obtained the confidence estimate of $(1-\alpha)(1-\epsilon)^{4m}$ as noted with eqn(6).
Since the confidence in our estimate is only $(1-\epsilon)$, the overall confidence that eqn(7) holds is
$(1-\epsilon)^{4m+2}$. We need to check whether our choice of $\epsilon \le 1/(2hm^2)$ is sufficient to ensure that this
confidence level exceeds $(1-1/h)$. As it happens, this is the case.

We  have  therefore  proved  that  A  is  indeed  a  meta-algorithm  for  M.  • 

5.  Conclusion 

This paper explored a new direction in the formal theory of learning - algorithms that learn to solve
problems from sample instances of the problems. Two random sources of sample instances are
considered, one providing solved instances and the other providing unsolved instances or exercises. For
both sources, general theorems are proved identifying conditions sufficient to permit learning. To
illustrate the scope of these results, they are applied to the construction of an algorithm that learns to
perform a restricted version of symbolic integration.

6.  References 

[1]  Valiant,  L.G.,  "A  Theory  of  the  Learnable",  Symposium  on  Theory  of  Computing,  1984. 

[2] Blumer, A., Ehrenfeucht, A., Haussler, D., and Warmuth, M., "Learning Geometric Concepts and the
Vapnik-Chervonenkis Dimension", Symposium on Theory of Computing, 1986.




[3]  Natarajan,  B.K.,  "On  Learning  Boolean  Functions",  Symposium  on  Theory  of  Computing,  1987. 

[4] Natarajan, B.K., and Tadepalli, P., "Two New Frameworks for Learning", Int. Conf. on Machine
Learning, 1988.

[5] Kearns, M., Li, M., Pitt, L., and Valiant, L.G., "On Learning Boolean Formulae", Symposium on Theory
of Computing, 1987.

[6] Mitchell, T.M., Keller, R., Kedar-Cabelli, S., Machine Learning, Vol. 1, 1986.


Appendix  A 

This section reviews some necessary definitions and results on learning families of sets with one-sided
error as presented in [3].

Let $f$ denote a subset of $\Sigma^*$ and $F$ be a family (a set) of such sets.

Defn: A family of sets $F$ is polynomial-time learnable with one-sided error if there exists an algorithm
$A$ and an integer $k$ such that

(a) $A$ takes as input integer $h$, the error parameter.

(b) $A$ may call EXAMPLE, where EXAMPLE returns randomly drawn elements of some set $f$ in $F$.
These elements are drawn according to an arbitrary and unknown probability distribution $P$ on $f$.
$A$ computes in time $(hl)^k$, where $l$ is the length of the longest example produced by EXAMPLE. $A$
may be randomized.

(c) For all $f$ in $F$ and all probability distributions $P$ on these sets $f$, with probability $(1-1/h)$ $A$ outputs a
program $C$ that runs in time $n^k$ on inputs of length $n$ and accepts a set $g$ in $F$ such that $g \subseteq f$ and

$\mathrm{Prob}[f - g] \le 1/h$.

Defn: Let $f \subseteq \Sigma^*$. For natural number $n$, the induced set $f_n$ is defined by $f_n = \{x \mid x \in f, |x| \le n\}$.
Similarly $F_n = \{f_n \mid f \in F\}$.

Defn: The dimension of a family $F$ is $d(n)$ if for all $n$, $|F_n| \le 2^{d(n)}$. If $d(n)$ is a polynomial in $n$, we say $F$
is of polynomial dimension.

Defn: An algorithm $Q$ is said to be a polynomial-time ordering for family $F$ if there exists an integer $k$
such that

(a) $Q$ takes as input a set of strings $S$. $Q$ outputs a program $C$ such that $C$ accepts a set $f$ in $F$, $S \subseteq f$.
Also, for all $g$ in $F$, $S \subseteq g$ implies $f \subseteq g$.

(b) Both $Q$ and $C$ run in (possibly randomized) time $l^k$ on inputs of length $l$.
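
To make the definition of an ordering concrete, the sketch below gives a deterministic ordering for a small family chosen purely for illustration and not taken from [3]: the sets $f_p$ of all strings that begin with a fixed prefix $p$, together with the empty set. The least member of this family containing a sample $S$ is $f_p$ for $p$ the longest common prefix of $S$. The function names are hypothetical, and the returned Python predicate plays the role of the program $C$.

```python
from typing import Callable, Iterable


def common_prefix(strings: Iterable[str]) -> str:
    """Longest prefix shared by every string in a non-empty collection."""
    items = list(strings)
    prefix = items[0]
    for s in items[1:]:
        i = 0
        while i < min(len(prefix), len(s)) and prefix[i] == s[i]:
            i += 1
        prefix = prefix[:i]          # shrink to the part shared with s
    return prefix


def ordering_q(sample: Iterable[str]) -> Callable[[str], bool]:
    """Return a recognizer C for the least prefix set containing the sample."""
    items = list(sample)
    if not items:
        return lambda x: False       # the empty set contains an empty sample
    p = common_prefix(items)
    return lambda x: x.startswith(p)
```

For the sample `["abca", "abd", "ab"]` the recognizer accepts exactly the strings beginning with `"ab"`. Any other prefix set containing the sample, such as the strings beginning with `"a"`, is a superset, which is the minimality property required in condition (a).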


Theorem A: A family $F$ is polynomial-time learnable with one-sided error if and only if $F$ is of
polynomial dimension, $F$ is closed under intersection, and $F$ possesses a polynomial-time ordering.


Proof: See [3] for details. •


